My Data Source

My analysis utilizes Georgia absentee voting data files and election results files directly downloaded from the Georgia Secretary of State’s office website. I downloaded all available absentee data spanning from 2014 to 2023. Please note that the website lists 2013 as an option, but its files cannot be downloaded, so I have excluded data from that year.

Historical Context and Why Absentee Votes Are Important

Originally, absentee voting in Georgia, like in much of the United States, was a provision for those unable to attend polling stations, such as military personnel and voters with disabilities.

Over time, laws have evolved to broaden the eligibility for absentee voting, reflecting a trend towards more inclusive voter participation.

Advancements in technology have simplified the absentee voting process, making it more secure and accessible to a larger portion of the population.

Absentee ballots have played crucial roles in close races, sometimes swinging the results, thereby highlighting their significance in the state’s electoral process.

The COVID-19 pandemic marked a watershed moment for absentee voting in Georgia, with health concerns driving unprecedented use of absentee ballots during the 2020 elections.

Post-2020, legislative responses such as Senate Bill 202 have sparked significant debate over the balance between electoral accessibility and security, emphasizing the ongoing relevance of absentee voting in Georgia’s political discourse.

Data Story: The Evolution of Absentee Voting in Georgia

Georgia is a state where a lot changes with every election cycle.

In 2014, it introduced the Exact Match Law, a policy requiring that citizens’ names on their government-issued IDs precisely match their names as listed on the voter rolls, and that voter registration information precisely match state databases. If the two don’t match, additional verification by a local registrar is necessary. Some say it was for accuracy; others felt it was a hurdle too high for some would-be voters. The absentee ballot, once a quiet participant in elections, started to murmur with significance.

Georgia’s Exact Match Law took effect, necessitating precise alignment between voter registration details and state databases. This law was heavily scrutinized, particularly for its disproportionate impact on minority groups, which some claimed could undermine democratic institutions and potentially disenfranchise specific voter demographics.

Fast forward to 2016. Presidential elections typically rouse public interest, spurring shifts in voting patterns due to the high stakes associated with electing a national leader. There’s something about choosing a president that sends ripples of excitement across the Peach State, nudging more folks to reach for absentee ballots, craving a part of history in the making.

Then came 2018, with a gubernatorial election that crackled with controversy and debate over who gets to vote and how. Held on November 6, 2018, concurrently with other statewide and local elections, it chose the next governor of Georgia. This wasn’t just any election; it was a spotlight on voting rights, pushing absentee voting out from the shadows as a beacon of accessibility.

The contentious gubernatorial race in Georgia, marked by accusations of voter suppression, brought the Exact Match Law back into the limelight. Reports surfaced that over 53,000 voter registrations were on hold due to minor discrepancies, affecting predominantly African American applicants, a situation some argue tipped the election in favor of the Republican candidate.

In 2019, Georgia decided to blend the old with the new, ushering in sophisticated voting machines and the reassuring rustle of paper ballots. This tech leap wasn’t just a nod to security; it was a handshake with trust, possibly making those skeptical of absentee voting give it a second glance.

This voting system update introduced new voting machines that produce paper ballots, aiming to boost confidence in electoral processes. While this primarily affected in-person voting, the ripple effect potentially influenced attitudes toward absentee voting.

2020 arrived with an uninvited guest: COVID-19. The pandemic turned the world upside down, and in Georgia it catalyzed a surge in absentee voting as safety concerns made traditional voting methods less appealing. The state responded with adjustments to accommodate the increased demand for absentee voting, like extending deadlines and relaxing rules.

Ballotpedia archived the sweeping changes across the U.S., noting the temporary modifications to voting procedures to accommodate the health crisis. Georgia didn’t enact statewide changes like automatic mail-in ballots for all voters, but the pandemic’s effect on voting behavior was obvious. National discourse suggested that absentee ballots could sway election outcomes, as mail votes showed significant leanings toward certain candidates, highlighting the urgency of adapting electoral processes to pandemic conditions. Editorial boards and political figures advocated for expanded voting options, considering them a necessary response to the public health crisis. They argued that mail voting is a reliable precaution for elections during a pandemic, emphasizing that voter safety should take precedence without compromising the integrity of the electoral process.

In the aftermath of such unprecedented times, 2021 brought with it Senate Bill 202. This legislation enacted stricter voter ID requirements for absentee ballots and limited the use of ballot drop boxes, among other changes. Some said the tighter rules safeguarded elections, while others argued they clipped the wings of voter freedom. The absentee voting landscape was changing yet again, under the watchful eyes of those it served.

By the 2022 midterms, Georgia was not just voting; it was testing the waters of its new legislation. Would the absentee vote hold strong or waver under the new law’s weight? This period offered a first glimpse at the impact of Senate Bill 202 on voter behavior, with absentee voting practices under new scrutiny.

Now, we look back at a decade in which absentee voting in Georgia has been a story of evolution between policy and people’s will, always adapting, always responding. The ongoing adjustments to voting legislation and the public’s response to these changes suggest a dynamic and responsive electoral environment, continually influenced by legislative, societal, and political shifts.

Low Absentee Voting Counties: These counties, like Catoosa, Chattooga, and Lumpkin, are typically more rural with smaller populations. Lower absentee voting rates could be due to limited access to absentee voting resources, or possibly less outreach and education on absentee voting options.

Low-Medium Absentee Voting Counties: Counties such as Barrow, Carroll, and Douglas represent a mix of suburban and rural areas. Their absentee voting rates might reflect a slightly higher engagement with absentee voting due to some proximity to urban areas and better resource allocation.

These areas tend to be rural and less affluent, and they have experienced precinct closures, which, according to the AJC, often occurred in rural and impoverished areas with higher poverty rates and significant African-American populations.

https://www.ajc.com/news/state--regional-govt--politics/voting-precincts-closed-across-georgia-since-election-oversight-lifted/bBkHxptlim0Gp9pKu7dfrN/

Medium Absentee Voting Counties: This group includes counties like Cherokee, Columbia, and Hall. These areas may have more access to voting resources compared to rural areas, hence the moderate levels of absentee voting.

Medium-High Absentee Voting Counties: Counties such as Clarke (which includes the city of Athens), Cobb, and Muscogee (home to Columbus) suggest more urbanized areas with higher populations. In these counties we see higher absentee voting due to greater awareness and the availability of voting resources.

Counties like these likely have more diverse and higher-income populations with better access to information and resources. These counties may have been less impacted by precinct closures or may have had more resources to counteract the impact through voter education and absentee ballot promotion.

High Absentee Voting Counties: This level includes highly urbanized counties such as Fulton and DeKalb. In these urban counties, despite the nationwide trend of precinct closures, the remaining precincts may still be more accessible due to a higher density of polling locations and public transportation options.

They often have more polling places and resources, and may have seen an increase in absentee voting as a proactive response to precinct closures. These counties could have robust campaigns to encourage absentee voting, mitigating the impact of fewer voting centers.

The pattern suggests that precinct closures could be a significant factor influencing the move towards absentee voting, especially in areas where in-person voting options have become limited. This has likely had a variable impact across the state, depending on local responses and the availability of resources to support absentee voting.

Low: Ben Hill, Brantley, Bryan, Butts, Candler, Catoosa, Chattooga, Coffee, Dade, Dawson, Echols, Fayette, Forsyth, Hart, Irwin, Lanier, Long, Lowndes, Lumpkin, Morgan, Oconee, Peach, Pickens, Pierce, Polk, Schley, Stephens, Thomas, Tift, Turner, Upson.

Low-Medium: Atkinson, Bacon, Barrow, Bartow, Ben Hill, Carroll, Charlton, Chattahoochee, Clinch, Colquitt, Crawford, Crisp, Decatur, Douglas, Effingham, Floyd, Gordon, Grady, Henry, Houston, Jackson, Jeff Davis, Macon, Miller, Murray, Pulaski, Rockdale, Screven, Seminole, Treutlen, Wayne, Worth.

Medium: Baldwin, Banks, Bleckley, Cherokee, Columbia, Cook, Dooly, Emanuel, Evans, Franklin, Gilmer, Greene, Hall, Harris, Jasper, Johnson, Liberty, Monroe, Paulding, Pike, Putnam, Quitman, Spalding, Towns, Troup, Union, Walker, Ware, Warren, Wheeler, Whitfield, Wilkes.

Medium-High: Berrien, Bulloch, Clarke, Clay, Clayton, Cobb, Coweta, Dodge, Elbert, Fannin, Glascock, Glynn, Gwinnett, Habersham, Haralson, Jones, Lamar, Lee, Madison, McDuffie, McIntosh, Meriwether, Muscogee, Newton, Oglethorpe, Randolph, Tattnall, Taylor, Toombs, Walton, White, Wilcox.

High: Appling, Baker, Bibb, Brooks, Burke, Calhoun, Camden, Chatham, DeKalb, Dougherty, Early, Fulton, Hancock, Heard, Jefferson, Jenkins, Laurens, Lincoln, Marion, Mitchell, Montgomery, Rabun, Richmond, Stewart, Sumter, Talbot, Taliaferro, Telfair, Terrell, Twiggs, Washington, Webster, Wilkinson.

These categorizations are based on the average percentage of absentee votes relative to the total votes in each county over the years from 2014 to 2023.
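
The categorization described above can be sketched in R as follows. This is a minimal illustration rather than the exact procedure behind the lists: it assumes a by-county summary with the columns Year, County, Total.Votes and Absentee.Votes, uses a small made-up data frame in place of the real file, and assumes the five levels are cut at quintiles of the county averages, which the text does not state.

```r
# Toy stand-in for the by-county summary (values are made up for illustration)
county_summary <- data.frame(
  Year = rep(c(2020, 2022), each = 5),
  County = rep(c("A", "B", "C", "D", "E"), times = 2),
  Total.Votes = c(1000, 2000, 1500, 3000, 2500, 800, 1800, 1200, 2600, 2000),
  Absentee.Votes = c(50, 200, 225, 600, 750, 40, 180, 180, 520, 600)
)

# Absentee share per county and year, as a percentage
county_summary$Absentee.Pct <- 100 * county_summary$Absentee.Votes / county_summary$Total.Votes

# Average share per county across years
county_avg <- aggregate(Absentee.Pct ~ County, data = county_summary, FUN = mean)

# Assumption: five levels cut at quintiles of the county averages
breaks <- quantile(county_avg$Absentee.Pct, probs = seq(0, 1, by = 0.2))
county_avg$Level <- cut(
  county_avg$Absentee.Pct,
  breaks = breaks,
  labels = c("Low", "Low-Medium", "Medium", "Medium-High", "High"),
  include.lowest = TRUE
)

print(county_avg)
```

With quintile cuts, each level holds roughly a fifth of the counties, which is consistent with the similar list lengths above but is still only an assumption about how the levels were drawn.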

Data Preparation – Examine the Data Structure

When examining the files, a typical CSV file for an absentee voting record is named with the date and month. Each month’s folder contains data for all or only some of Georgia’s counties. Each county’s CSV file contains numerous columns, including:

  • County name
  • Voter name
  • Voter address
  • Ballot status
  • Ballot style
  • Ballot return date
  • Ballot issued date
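
When such a file is read with R’s read.csv(), column headers that contain spaces are converted to syntactic names by default, so a header like "Ballot Return Date" becomes Ballot.Return.Date. A minimal sketch of that behaviour follows, using a small in-memory CSV whose header wording and values are assumed from the column list above rather than copied from a real file.

```r
# Hypothetical two-row extract; header wording and values are assumptions
csv_text <- 'County,Voter Name,Ballot Status,Ballot Style,Ballot Return Date,Ballot Issued Date
FULTON,VOTER ONE,A,MAILED,10/20/2020,09/18/2020
FULTON,VOTER TWO,A,IN PERSON,10/15/2020,10/15/2020'

# Default behaviour (check.names = TRUE): spaces in headers become dots
records <- read.csv(text = csv_text, header = TRUE)
print(names(records))

# Keep the original headers instead; they then need backticks when used with $
records_raw <- read.csv(text = csv_text, header = TRUE, check.names = FALSE)
print(records_raw$`Ballot Return Date`)
```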

For each CSV file, I have executed the following data preprocessing steps using R code:

-Extracted the county name.

-Filtered out rows with an empty “Ballot Return Date” column. A non-empty value indicates a valid ballot that has been successfully counted, so this step provides the total valid vote count for each county in a specific month and year. I counted the remaining rows in each CSV file, then aggregated these counts across all CSV files in each folder to determine the total valid votes for each year.

-Filtered rows based on the “Ballot Style” column to exclude those marked “IN PERSON.” This isolates valid absentee votes, including mail-in and electronic votes. I followed the same procedure, summing the valid absentee votes for each CSV file to calculate the total valid absentee votes for each year in Georgia.
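
The filtering and counting steps above can be sketched for a single file as follows. This is a minimal illustration on made-up records: the helper name count_ballots is hypothetical, the column names follow the dotted form that read.csv() produces, and both NA and empty strings are treated as a missing return date.

```r
# Hypothetical helper: count returned ballots and returned non-in-person ballots
count_ballots <- function(records) {
  # A ballot counts as returned when its return date is present and non-empty
  returned <- !is.na(records$Ballot.Return.Date) & records$Ballot.Return.Date != ""
  # Among returned ballots, keep those whose style is known and not "IN PERSON"
  not_in_person <- returned & !is.na(records$Ballot.Style) & records$Ballot.Style != "IN PERSON"
  c(total = sum(returned), absentee = sum(not_in_person))
}

# Made-up records for one county file
records <- data.frame(
  County = "FULTON",
  Ballot.Style = c("MAILED", "IN PERSON", "MAILED", "ELECTRONIC"),
  Ballot.Return.Date = c("10/20/2020", "10/15/2020", "", "10/22/2020"),
  stringsAsFactors = FALSE
)

counts <- count_ballots(records)
print(counts)
```

Checking for empty strings explicitly matters because read.csv() keeps blank text fields as "" rather than NA unless na.strings is set.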

Data Vetting

Source of Data: This is a primary source, as it comes directly from the Georgia Secretary of State’s office. It is also used in many articles and election reports, and by reputable organizations such as the Atlanta Journal-Constitution.

Time Period: The website lists absentee data files from 2013 to 2023, but the 2013 files are not downloadable. The time period for this investigative data story is therefore 2014 to 2023.

Number of Records: The estimated total number of records in the Georgia Absentee Voter Records database, spanning 2014 to 2023, is approximately 314,720.

Duplicates: After sampling several CSV files from different years and counties, I found no duplicate entries in terms of voter names and addresses. This suggests that the dataset may have a high degree of integrity in this aspect. However, this is a preliminary finding based on a limited sample and should be considered an estimate rather than a conclusive result.
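
A duplicate check of this kind could be sketched as below. The column names Voter.Name and Voter.Address are assumptions based on the column list earlier in this document, and the records are made up.

```r
# Made-up records; column names Voter.Name and Voter.Address are assumptions
records <- data.frame(
  Voter.Name = c("VOTER ONE", "VOTER TWO", "VOTER ONE", "VOTER THREE"),
  Voter.Address = c("1 MAIN ST", "2 OAK AVE", "1 MAIN ST", "3 PINE RD"),
  stringsAsFactors = FALSE
)

# Flag every row whose name/address pair already appeared earlier in the file
key_columns <- records[, c("Voter.Name", "Voter.Address")]
is_duplicate <- duplicated(key_columns)

n_duplicates <- sum(is_duplicate)
print(n_duplicates)
print(records[is_duplicate, ])
```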

Consistency Issues: I did a sample check on fields such as county names and addresses. This preliminary check revealed no significant inconsistencies in the spelling or formatting of these fields. However, as with any large data set, there might be minor discrepancies not captured in the sample.
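
One way to look for spelling or formatting variants of county names is to normalize them and see which normalized names map back to more than one raw spelling. The sketch below uses made-up values; the normalization rules (trim, upper-case, drop spaces) are an illustrative choice, not a description of the check actually performed.

```r
# Made-up county values as they might appear across files
county_values <- c("FULTON", "Fulton ", "DEKALB", "De Kalb", "COBB")

# Normalize: trim whitespace, upper-case, and drop internal spaces
normalized <- gsub(" ", "", toupper(trimws(county_values)))

# Count distinct raw spellings per normalized name; more than one signals an inconsistency
spellings <- tapply(county_values, normalized, function(x) length(unique(x)))
inconsistent <- names(spellings)[spellings > 1]

print(inconsistent)
```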

Numeric Fields: Dates align with the relevant election cycles and absentee voting periods, and vote counts appear reasonable, without any outliers suggesting data entry errors. However, this is based on a limited sample and assumes the data set’s overall integrity. For a comprehensive verification, a full-scale analysis using statistical tools would be needed.
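
A date check along these lines could be sketched as follows. The month/day/year format, the example values, and the election window are all assumptions made for illustration.

```r
# Made-up return dates; the month/day/year format is an assumption
return_dates <- c("10/20/2020", "11/03/2020", "13/45/2020", "10/22/2021")

# Parse the strings; values that do not fit the format become NA
parsed <- as.Date(return_dates, format = "%m/%d/%Y")

# Hypothetical window for one election cycle
window_start <- as.Date("2020-09-01")
window_end <- as.Date("2020-11-03")

unparseable <- is.na(parsed)
out_of_window <- !unparseable & (parsed < window_start | parsed > window_end)

print(return_dates[unparseable])
print(return_dates[out_of_window])
```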

Missing Data: An examination of the sampled files reveals minimal instances of missing data. Key fields such as voter names, addresses, and ballot information are predominantly complete.
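
The share of missing values per column can be estimated with a short sketch like the one below. The records and column names are made up, and both NA and empty strings are counted as missing.

```r
# Made-up records with some gaps; column names are assumptions
records <- data.frame(
  Voter.Name = c("VOTER ONE", "VOTER TWO", "", "VOTER FOUR"),
  Ballot.Style = c("MAILED", NA, "MAILED", "IN PERSON"),
  Ballot.Return.Date = c("10/20/2020", "", "", "10/15/2020"),
  stringsAsFactors = FALSE
)

# Treat both NA and empty strings as missing, then take the share per column
is_missing <- function(x) is.na(x) | x == ""
missing_share <- sapply(records, function(column) mean(is_missing(column)))

print(round(100 * missing_share, 1))
```

An empty return date is not necessarily an error, since it may simply mean the ballot was never returned, so that column is best read separately from the others.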

Questions for Clarification: Questions I’d raise include, “What are the protocols for data entry and verification by the Georgia Secretary of State’s office?” and “Are there any known limitations or biases in the absentee voting data collection process?”

Key Findings: The initial analysis indicates a rise in absentee voting over recent years, marked by considerable variations among different counties. These trends could be shaped by changes in voting laws, demographic shifts, and societal or technological changes.

Data Reproducibility

If you would like to reproduce the analysis presented in this document, you can follow these steps. Please note that the code below is set not to run automatically when you knit this document to HTML. You need to manually execute it if you want to reproduce the results.

Prerequisites:

  1. Ensure that you have R and RStudio installed on your computer.
  2. Download the data files from [source URL] and place them in a directory on your desktop named “absentee.”

Instructions:

  1. Copy and paste the following code chunk into your R environment.

# This code produces the Yearly_Absentee_Votes_Summary.csv file.

# Path to the main directory on your desktop
main_dir <- "~/Desktop/absentee"

# Years to process
years <- 2014:2023

# Initialize a data frame for the yearly results
yearly_results <- data.frame(Year = integer(), Total.Votes = numeric(), Absentee.Votes = numeric(), stringsAsFactors = FALSE)

# Process each year

for(year in years) {
    year_dir <- file.path(main_dir, as.character(year))
    # Initialize a variable for summing votes for the current year
    total_votes_year = 0
    # Initialize a variable for summing not in person votes for the current year
    total_votes_year_notinperson = 0
    # Check if the year directory exists to avoid errors
    if (dir.exists(year_dir)) {
        # List all sub-folders within the year folder
        month_folders <- list.dirs(year_dir, full.names = TRUE, recursive = FALSE)
        # Filter only sub-folders that contain CSV files
        month_folders <- month_folders[vapply(month_folders, function(folder) any(tools::file_ext(list.files(folder)) == "csv"), logical(1))]
        for(month_folder in month_folders) {
            # List of CSV files in the current month folder
            file_list <- list.files(month_folder, pattern = "\\.csv$", full.names = TRUE)
            
            # Process each file in the month folder
            for(file_name in file_list) {
                # Read the CSV file, treating empty strings as missing values
                data <- read.csv(file_name, header = TRUE, na.strings = c("", "NA"))

                # Count all returned ballots (non-empty 'Ballot Return Date')
                total_votes <- sum(!is.na(data$Ballot.Return.Date))
                # Count returned ballots whose 'Ballot Style' is not 'IN PERSON'
                absentee_votes <- sum(data$Ballot.Style != "IN PERSON" & !is.na(data$Ballot.Return.Date), na.rm = TRUE)

                # Accumulate the total votes for the year
                total_votes_year <- total_votes_year + total_votes
                total_votes_year_notinperson <- total_votes_year_notinperson + absentee_votes
            }
        }
        
        # Add the year and its total votes to the yearly results
        yearly_results <- rbind(yearly_results, data.frame(Year = year, Total.Votes = total_votes_year, Absentee.Votes = total_votes_year_notinperson))
    }
  }

write.csv(yearly_results, file.path(main_dir, "Yearly_Absentee_Votes_Summary.csv"), row.names = FALSE)
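
The yearly summary can then be turned into an absentee rate. A minimal sketch follows, assuming a summary with the columns Year, Total.Votes and Absentee.Votes; the figures are made up and do not come from the Georgia data.

```r
# Made-up yearly totals standing in for Yearly_Absentee_Votes_Summary.csv
yearly_summary <- data.frame(
  Year = c(2018, 2020, 2022),
  Total.Votes = c(2000, 4000, 3000),
  Absentee.Votes = c(200, 1300, 300)
)

# Absentee rate: share of returned ballots that were not cast in person
yearly_summary$Absentee.Rate <- yearly_summary$Absentee.Votes / yearly_summary$Total.Votes

# Change in the rate from one listed year to the next, in percentage points
yearly_summary$Rate.Change.Pts <- c(NA, 100 * diff(yearly_summary$Absentee.Rate))

print(yearly_summary)
```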


# This code produces the "Yearly_Absentee_Votes_Summary By County.csv" file.


# Path to the main directory on your desktop
main_dir <- "~/Desktop/absentee"

# Years to process
years <- 2014:2023

# Initialize a data frame for the yearly results
yearly_results <- data.frame(Year = integer(), County = character(), Total.Votes = numeric(), Absentee.Votes = numeric(), stringsAsFactors = FALSE)

# Process each year

for (year in years) {
  # Set the directory to the current year's folder
  year_dir <- file.path(main_dir, as.character(year))

  # dictionary for county votes
  county_votes <- list()

  # Check if the year directory exists to avoid errors
  if (dir.exists(year_dir)) {
    # List all sub-folders within the year folder
    month_folders <- list.dirs(year_dir, full.names = TRUE, recursive = FALSE)

    # Filter only sub-folders that contain CSV files
    month_folders <- month_folders[vapply(month_folders, function(folder) any(tools::file_ext(list.files(folder)) == "csv"), logical(1))]

    for (month_folder in month_folders) {
      # List of CSV files in the current month folder
      file_list <- list.files(month_folder, pattern = "\\.csv$", full.names = TRUE)

      # Process each file in the month folder
      for (file_name in file_list) {
        # Read the CSV file, treating empty strings as missing values
        data <- read.csv(file_name, header = TRUE, na.strings = c("", "NA"))
        # Get the unique county names, dropping missing entries
        county_name_list <- unique(data$County)
        county_name_list <- county_name_list[!is.na(county_name_list)]

        for (name in county_name_list) {
          if (!(name %in% names(county_votes))) {
            # 1st element is total votes, 2nd element is absentee votes
            county_votes[[name]] <- list(0, 0)
          }
        }

        for (name in county_name_list) {
          # Rows for this county with a non-empty 'Ballot Return Date'
          returned <- !is.na(data$Ballot.Return.Date) & data$County == name
          # Count all returned ballots
          total_votes <- sum(returned, na.rm = TRUE)
          # Count returned ballots whose 'Ballot Style' is not 'IN PERSON'
          absentee_votes <- sum(returned & data$Ballot.Style != "IN PERSON", na.rm = TRUE)

          county_votes[[name]][[1]] <- county_votes[[name]][[1]] + total_votes
          county_votes[[name]][[2]] <- county_votes[[name]][[2]] + absentee_votes
        }
      }
    }

    # Loop through keys in the 'county_votes' dictionary
    for (county_name in names(county_votes)) {
      # Create a new data frame for the current county with correct column names
      county_data <- data.frame(
        Year = year,
        County = county_name,
        Total.Votes = county_votes[[county_name]][[1]][1],
        Absentee.Votes = county_votes[[county_name]][[2]][1])
      # Append the county data to the yearly results
      yearly_results <- rbind(yearly_results, county_data)
    }
  }
}

# Write the yearly results to a new CSV file

write.csv(yearly_results, file.path(main_dir, "Yearly_Absentee_Votes_Summary By County.csv"), row.names = FALSE)
